View the summary statistics of the dataset though this may not make sense as they are already described in the EDA report
Code
df.describe()
year
mean
median
Mode
Variance
Standard Deviation
IQR
Loading ITables v2.2.4 from the init_notebook_mode cell...
(need help?)
check the shape of the dataset
Code
df.shape
(91200, 10)
get the first and last year
Code
df['year'].agg(['min', 'max'])
year
Loading ITables v2.2.4 from the init_notebook_mode cell...
(need help?)
Food Price Comparison by Region and Year
For comparing average prices across regions for the same food item and year, the closing price is generally the best price to use. Here’s why:
Consistency: The closing price is considered the most representative of the market’s consensus for that period. It accounts for the entire trading session and reflects both supply and demand dynamics over time.
Standard Usage: The closing price is the most widely used price in financial markets, meaning data and analysis are typically focused around this price.
Simplicity: It eliminates the noise created by intra-day fluctuations, focusing on the price at the end of the trading session, which is more relevant for long-term comparisons.
Code
# Filter the dataframe to keep only rows where 'food_item' starts with 'c_'df_filtered = df[df['Food_Items'].str.startswith('c_')]# remove the 'c_' from the 'food_item' columndf_filtered.loc[:, 'Food_Items'] = df_filtered['Food_Items'].str.replace('c_', '', regex=True)df_filtered
Region
Province
Food_Items
year
mean
median
Mode
Variance
Standard Deviation
IQR
Loading ITables v2.2.4 from the init_notebook_mode cell...
(need help?)
Code
# Create figurefig = go.Figure()# Add traces for each food item and statfood_items = df_filtered['Food_Items'].unique()stats = ['mean', 'median']for food in food_items:for stat in stats: df_food = df_filtered[df_filtered['Food_Items'] == food]for region in df_food['Region'].unique(): df_region = df_food[df_food['Region'] == region] fig.add_trace(go.Scatter( x=df_region['year'], y=df_region[stat], mode='lines+markers', name=f"{food} - {region} ({stat})", visible=False ))# Set initial visible traces (first food item, 'mean' stat)initial_food = food_items[0]initial_stat ='mean'for trace in fig.data:if initial_food in trace.name and initial_stat in trace.name: trace.visible =True# Create separate dropdown buttons for food items and statisticsfood_buttons = []stat_buttons = []for food in food_items: visibility = [food in trace.name and initial_stat in trace.name for trace in fig.data] food_buttons.append({'label': food,'method': 'update','args': [{'visible': visibility}, {'title': f"{initial_stat.capitalize()} for {food} by Region and Year"}] })for stat in stats: visibility = [initial_food in trace.name and stat in trace.name for trace in fig.data] stat_buttons.append({'label': stat,'method': 'update','args': [{'visible': visibility}, {'yaxis': {'title': stat.capitalize()}}, {'title': f"{stat.capitalize()} for {initial_food} by Region and Year"}] })# Update layout with separate dropdown menusfig.update_layout( title=f"{initial_stat.capitalize()} for {initial_food} by Region and Year", xaxis_title="Year", yaxis_title=initial_stat.capitalize(), template="plotly_white", updatemenus=[ {'buttons': food_buttons,'direction': 'down','showactive': True,'x': 0,'xanchor': 'left','y': 1.12,'yanchor': 'top', }, {'buttons': stat_buttons,'direction': 'down','showactive': True,'x': 0.28,'xanchor': 'left','y': 1.12,'yanchor': 'top', } ])# Show figurefig.show()
Figure 1: A line chart showing the mean and median price of each food item by region and year
Key Observation
The graph shows the mean (average) price and/or median price of different food items over time for different regions in the Philippines.
Trend - Generally, the price of all food items has been increasing over the years across most regions. There are some fluctuations and periods of stability for some items, but the overall trend is upward.
Regional Differences - There is significant variation in price of different food items across regions. Some regions consistently have higher prices than others. Price fluctuations also varies between regions. Some regions experience more volatile prices than others.
Key Reasons
Factors like weather patterns, government policies, agriculture changes, food transportation and global market trends can influence these prices.
Price Difference over the Years
Code
file_path ="../Data/Interim/cleaned_food_prices.csv"df_range = pd.read_csv(file_path)#dropping columns related to food price indexdf_nofpi_range = df_range.drop(columns=['o_food_price_index', 'h_food_price_index', 'l_food_price_index', 'c_food_price_index', 'inflation_food_price_index', 'trust_food_price_index'])# Convert 'Date' column to datetime formatdf_nofpi_range['Date'] = pd.to_datetime(df_nofpi_range['Date'])#dropping columns related to inflationdf_noinf_range = df_nofpi_range.drop(columns=['inflation_beans','inflation_cabbage', 'inflation_carrots', 'inflation_eggs', 'inflation_garlic', 'inflation_meat_beef_chops', 'inflation_meat_chicken_whole', 'inflation_meat_pork', 'inflation_onions', 'inflation_potatoes', 'inflation_rice', 'inflation_tomatoes'])#dropping columns related to trust scoresdf_cleaned_range = df_noinf_range.drop(columns=['trust_beans','trust_cabbage', 'trust_carrots', 'trust_eggs', 'trust_garlic', 'trust_meat_beef_chops', 'trust_meat_chicken_whole', 'trust_meat_pork', 'trust_onions', 'trust_potatoes', 'trust_rice', 'trust_tomatoes'])#dropping uneeded columnsdf_unneeded_range = df_cleaned_range.drop(columns=['country', 'City', 'lat', 'lon', 'Province', 'year', 'month'])# Reshaping from wide to long format (including year and month as part of the identifier)df_range = df_unneeded_range.melt(id_vars=['Region', 'Date'], var_name='Food_Items', value_name='Price')df_range_filtered = df_range[df_range['Food_Items'].str.startswith('c_')]df_range_filtered.loc[:, 'Food_Items'] = df_range_filtered['Food_Items'].str.replace('c_', '', regex=True)price_stats = df_range_filtered.groupby(['Date', 'Food_Items'])['Price'].agg(['min', 'max', 'mean']).reset_index()price_stats
Date
Food_Items
min
max
mean
Loading ITables v2.2.4 from the init_notebook_mode cell...
(need help?)
Code
def range_chart(price_stats):# Convert Date to datetime if it isn't already price_stats['Date'] = pd.to_datetime(price_stats['Date'])# Get unique food items for our dropdown food_items =sorted(price_stats['Food_Items'].unique())# Initialize our figure fig = go.Figure()# Create a list to store our dropdown menu options dropdown_buttons = []# For each food item, we'll create a filled range plotfor i, food_item inenumerate(food_items):# Filter data for this specific food item item_data = price_stats[price_stats['Food_Items'] == food_item]# Create a filled range plot fig.add_trace( go.Scatter( x=item_data['Date'], y=item_data['max'], name='Price Range', mode='lines', line=dict(width=0), showlegend=False, visible=(i ==0) # Only first item visible initially ) ) fig.add_trace( go.Scatter( x=item_data['Date'], y=item_data['min'], name=f'{food_item} Price Range', fill='tonexty', # Fill area between traces mode='lines', line=dict(width=0), fillcolor='rgba(68, 138, 255, 0.4)', # Semi-transparent blue visible=(i ==0) # Only first item visible initially ) )# Add a line for the mean price fig.add_trace( go.Scatter( x=item_data['Date'], y=item_data['mean'], name='Average Price', mode='lines', line=dict( color='rgb(31, 119, 180)', width=2 ), visible=(i ==0) # Only first item visible initially ) )# Create visibility list for this food item's traces# Each food item has 3 traces (upper bound, lower bound, and mean) visible_list = [False] * (len(food_items) *3) visible_list[i*3:(i*3)+3] = [True, True, True]# Add button for this food item to dropdown menu dropdown_buttons.append(dict( args=[{"visible": visible_list}, {"title": f"Price Range for {food_item} Over Time"}], label=food_item, method="update" ) )# Update layout with dropdown menu and other customizations fig.update_layout( updatemenus=[dict( buttons=dropdown_buttons, direction="down", showactive=True, x=0.05, y=1.15, xanchor="left", yanchor="top" ) ],# Add title and axis labels title=f"Price Range for {food_items[0]} Over Time", xaxis_title="Date", yaxis_title="Price", hovermode='x unified',# Improve layout appearance template="plotly_white", margin=dict(t=110), # Add top margin for dropdown# Improve legend layout legend=dict( orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1 ) )# Customize hover template to show min, max, and mean pricesfor trace in fig.data: trace.hovertemplate ='%{y:.2f}<extra></extra>'return fig# Example usage:fig = range_chart(price_stats)fig.show()
Code
price_stats['price_spread'] = price_stats['max'] - price_stats['min']def spread_chart(price_stats):# Get unique food items for dropdown food_items =sorted(price_stats['Food_Items'].unique())# Initialize figure figrange = go.Figure()# Create a list to store dropdown menu options dropdown_buttons = []for i, food_item inenumerate(food_items):# Filter data for this food item item_data = price_stats[price_stats['Food_Items'] == food_item]# Create the main line plot for price spread figrange.add_trace( go.Scatter( x=item_data['Date'], y=item_data['price_spread'], name='Price Spread', mode='lines+markers', line=dict(color='blue', width=2), marker=dict(size=6), visible=(i ==0), customdata=np.stack(( item_data['min'], item_data['max'], item_data['mean'] ), axis=-1), hovertemplate=('<b>Date</b>: %{x|%B %Y}<br>'+'<b>Price Spread</b>: ₱%{y:.2f}<br>'+'<b>Minimum Price</b>: ₱%{customdata[0]:.2f}<br>'+'<b>Maximum Price</b>: ₱%{customdata[1]:.2f}<br>'+'<b>Average Price</b>: ₱%{customdata[2]:.2f}<br>'+'<extra></extra>' ) ) )# Calculate statistics for annotations avg_spread = item_data['price_spread'].mean() max_spread = item_data['price_spread'].max() max_spread_date = item_data.loc[item_data['price_spread'].idxmax(), 'Date']# Add a dashed line for average spread figrange.add_trace( go.Scatter( x=[item_data['Date'].min(), item_data['Date'].max()], y=[avg_spread, avg_spread], name='Average Spread', mode='lines', line=dict(color='red', width=2, dash='dash'), visible=(i ==0) ) )# Create visibility list for this food item's traces visible_list = [False] * (len(food_items) *2) visible_list[i*2:(i*2)+2] = [True, True]# Add button for this food item to dropdown menu dropdown_buttons.append(dict( args=[ {"visible": visible_list}, {"title": f"Price Spread Analysis for {food_item}<br>" } ], label=food_item, method="update" ) )# Update layout figrange.update_layout( updatemenus=[dict( buttons=dropdown_buttons, direction="down", showactive=True, x=0.1, y=1.15, xanchor="left", yanchor="top" ) ], title=f"Price Spread Analysis for {food_items[0]}", xaxis_title="Date", yaxis_title="Price Spread in ₱", template="plotly_white", margin=dict(t=120, b=50), legend=dict( orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1 ),# Add annotation annotations=[dict( text="Price Spread = Maximum Price - Minimum Price", xref="paper", yref="paper", x=0, y=-0.15, showarrow=False, font=dict(size=10) ) ] )return figrangefigrange = spread_chart(price_stats)figrange.show()
Key Observations:
Overall Trend - Both the minimum and maximum prices show a general upward trend over the years, indicating an overall increase in the price of every food items. Some even recorded highest price increase on 2020, likely due to the COVID-19 pandemic and its impact on supply chains.
Price Range - The gap between the minimum and maximum prices for most items widens significantly in 2020 and 2021. This suggests that there were greater variations in prices across regions during these years.
Posssible causes:
COVID-19 Pandemic - The pandemic disrupted supply chains and increased demand for certain food items, which could have led to price volatility.
Regional Differences - Differences in the impact of the pandemic on regional economies and agricultural production could have contributed to price variations.
Regional Price difference
To truly know if the regions really have significant difference in prices over the years, hypothesis testing must be done. Please refer to .._testing.ipynb to know why Kruskall-Wallis was used here.
View the first few rows of the dataset
Code
#open the csv file that was already analyzed with kruskall wallis test.hypothesis_results ="../Data/Processed/hypothesis_testing_result.csv"hyp_df = pd.read_csv(hypothesis_results)hyp_df.head()
Year
Food_Item
H-statistic
p-value
Result
Loading ITables v2.2.4 from the init_notebook_mode cell...
(need help?)
Filter the dataframe to keep only rows where ‘food_items’ starts with ‘c_’ and remove prefix ’_c’
Code
# Filter the dataframe to keep only rows where 'food_item' starts with 'c_'hyp_df_filtered = hyp_df[hyp_df['Food_Item'].str.startswith('c_')]# removing prefix _c in the food itemshyp_df_filtered.loc[:, 'Food_Item'] = hyp_df_filtered['Food_Item'].str.replace('c_', '', regex=True)hyp_df_filtered.head()
Year
Food_Item
H-statistic
p-value
Result
Loading ITables v2.2.4 from the init_notebook_mode cell...
(need help?)
check the the results via using table or check unique results
# Perform Dunn's Test for all food items and yearsdt_results = []for item in df_dt['Food_Items'].unique():for year in df_dt['year'].unique(): filter_dfdt = df_dt[(df_dt['Food_Items'] == item) & (df_dt['year'] == year)]ifnot filter_dfdt.empty: dunn_result = sp.posthoc_dunn(filter_dfdt, val_col='Price', group_col='Region') dunn_result['Food_Items'] = item dunn_result['year'] = year dt_results.append(dunn_result)# Combine results into a single DataFrame and save to CSVif dt_results: final_results = pd.concat(dt_results) final_results.to_csv("dunns_test_results.csv", index=True)print("Dunn's test results saved to dunns_test_results.csv")else:print("No valid comparisons found.")
Dunn's test results saved to dunns_test_results.csv
This heatmap visualizes the results of a Dunn’s post-hoc test for significant differences in prices of different food items between regions across the years. The color scale indicates the following: * Red (Close to 0): Indicates a statistically significant difference in food prices between two regions. * Light Green to Yellow (Around 0.2 to 0.8): Suggests a possible difference, but not statistically significant based on the chosen significance level (typically 0.05) * Dark Green (Close to 0): Indicates statistically no significant difference in bean prices between the two regions.
Key Observations
Significant Differences Exist: The presence of so many red cells indicates that there are statistically significant differences in food prices in almost all regions at any given years.
No Significant Differences: patches of green indicate pairs of regions where the Dunn’s test did not find a statistically significant difference in food prices, but there are only few of them.
Market Average Comparison: The “Market Average” row/column allows you to see how each region’s prices compare to the overall average. Dark red cells in this row/column would indicate regions with prices significantly different from the market average.
Food price Prediction 3 years into the future
Code
forecasts_df = pd.read_csv('forecasts.csv')def create_interactive_forecast_plot(forecasts_df):"""Create an interactive forecast plot using Plotly."""# Get unique regions and items regions =sorted(forecasts_df['region'].unique()) items =sorted(forecasts_df['item'].unique())# Create figure with secondary y-axis fig = make_subplots(specs=[[{"secondary_y": True}]])# Add traces for all combinationsfor region in regions:for item in items: forecast = forecasts_df[ (forecasts_df['region'] == region) & (forecasts_df['item'] == item) ].iloc[0]# Parse JSON strings historical_dates = json.loads(forecast['historical_dates']) historical_values = json.loads(forecast['historical_values']) forecast_dates = json.loads(forecast['forecast_dates']) forecast_values = json.loads(forecast['forecast_values'])# Historical data trace fig.add_trace( go.Scatter( x=historical_dates, y=historical_values, name="Historical", line=dict(color='blue'), visible=(region == regions[0] and item == items[0]) ), secondary_y=False )# Forecast data trace fig.add_trace( go.Scatter( x=forecast_dates, y=forecast_values, name="Forecast", line=dict(color='red', dash='dash'), visible=(region == regions[0] and item == items[0]) ), secondary_y=False )# Create dropdown buttonsdef create_visibility_array(selected_region, selected_item):return [ (r == selected_region and i == selected_item)for r in regionsfor i in itemsfor _ inrange(2) # Two traces per combination ]def get_arima_text(region, item): forecast = forecasts_df[ (forecasts_df['region'] == region) & (forecasts_df['item'] == item) ].iloc[0]return (f"ARIMA({forecast['p']},{forecast['d']},{forecast['q']})"f"({forecast['P']},{forecast['D']},{forecast['Q']},12)")# Region Dropdown region_buttons = [dict( args=[ {'visible': create_visibility_array(region, items[0])}, {'annotations': [dict(text="Region:", x=0, y=1.1, yref="paper", xref="paper", showarrow=False),dict(text="Food Item:", x=0.3, y=1.1, yref="paper", xref="paper", showarrow=False),dict( text=f"Note: ARIMA parameters used: {get_arima_text(region, items[0])}", x=0, y=-0.2, xref="paper", yref="paper", showarrow=False, font=dict(size=12, color="gray") ) ]} ], label=region, method='update' ) for region in regions ]# Item Dropdown item_buttons = [dict( args=[ {'visible': create_visibility_array(regions[0], item)}, {'annotations': [dict(text="Region:", x=0, y=1.1, yref="paper", xref="paper", showarrow=False),dict(text="Food Item:", x=0.3, y=1.1, yref="paper", xref="paper", showarrow=False),dict( text=f"Note: ARIMA parameters used: {get_arima_text(regions[0], item)}", x=0, y=-0.2, xref="paper", yref="paper", showarrow=False, font=dict(size=12, color="gray") ) ]} ], label=item, method='update' ) for item in items ] updatemenus = [# Region Dropdowndict( buttons=region_buttons, direction='down', showactive=True, x=0.05, xanchor='left', y=1.115, yanchor='top' ),# Item Dropdowndict( buttons=item_buttons, direction='down', showactive=True, x=0.435, xanchor='left', y=1.115, yanchor='top' ) ]# Initial ARIMA text initial_arima_text = get_arima_text(regions[0], items[0])# Update x-axis to show all years fig.update_xaxes( dtick="M12", # Show tick mark for every 12 months tickformat="%Y", # Show only the year tickangle=45, # Angle the labels for better readability showgrid=True, gridwidth=1, gridcolor='lightgray' )# Update layout with fixed dimensions and consistent annotations fig.update_layout( title='Price Forecast by Region and Food Item', xaxis_title='Date', yaxis_title='Price', updatemenus=updatemenus, annotations=[dict(text="Region:", x=0, y=1.1, yref="paper", xref="paper", showarrow=False),dict(text="Food Item:", x=0.4, y=1.1, yref="paper", xref="paper", showarrow=False),dict(text=f"Note: most optimized ARIMA parameters: {initial_arima_text}", x=0, y=-0.2, xref="paper", yref="paper", showarrow=False, font=dict(size=12, color="gray") ) ], height=563, # Fixed height width=1000, # Fixed width showlegend=True, legend=dict( yanchor="top", y=0.99, xanchor="left", x=0.95 ), margin=dict(t=150, b=120) # Add margin to accommodate dropdown and annotation )return fig# Create and display the interactive plotfig = create_interactive_forecast_plot(forecasts_df)fig.show()
Key Observations
Historical Price Volatility: The blue line representing historical prices shows significant fluctuations, especially noticeable spikes around 2008 and smaller ones in other years. This indicates that most food prices have been subject to considerable volatility over the past two decades.
Recent Price Increase: There’s a clear upward trend in food prices in the period leading up to the forecast (roughly from 2020 to 2024). This suggests recent factors have been pushing prices higher.
Relatively Stable Forecast: The red dashed line, representing the forecast for the next 36 months, shows a relatively stable and slightly upward trend. The model predicts a continued but moderated increase compared to the recent past.
Key Observations
Historical Volatility:
Weather Patterns:adverse weather events in key agricultural regions can significantly impact supply and cause price volatility.
Changes in Demand: Fluctuations in global demand, driven by population growth or economic factors, can also contribute to price swings.
Recent Price Increase:
Inflation: General inflation in the economy could be pushing up agricultural commodity prices.
Supply Chain Issues: Disruptions to global supply chains (e.g., due to pandemics, conflicts, or trade policies) can lead to shortages and price increases.
Increased Input Costs: Rising costs of fertilizers, fuel, and labor can increase the cost of food production, which could be passed on to consumers.
Relatively Stable Forecast:
Model Assumption: ARIMA models assume that, to some extent, future trends will resemble past patterns. The stable forecast suggests the model has not picked up any strong upward or downward pressures for the next 36 months beyond what is already captured in the recent trend.